%-*- lang : icon -*-
\documentstyle [noweb,11pt,fullpage] {article}
\pagestyle {noweb}
\title {Extending Noweb With Some Typesetting (Icon version)}
\author {Kostas N. Oikonomou \\ {\tt ko@surya.ho.att.com}}
\begin {document}
\maketitle

\section {Introduction}

This is a {\tt noweb} filter, written in Icon, which adds to {\tt noweave} the
capability of some simple pretty-printing (no indentation or line-breaking!) in
code sections.  This particular version implements pretty-printing for the Icon
language.  However, the filter is written in a way that should make it easy for
someone to change the target language: most of the code is language-independent
(in \S\ref{lipp} and on), while the language-dependent code occupies only
sections 1 and 2.  In fact, what needs to be changed when the language changes
is only the [[translation]] table in procedure [[main]], and the definitions of
the ``interesting'' tokens in the beginning of \S\ref{int}\footnote{Well,
hopefully, anyway.}.  Using this two-part scheme, the language-independent part
(which is in file [[lipp.nw]]) has been used with the language-dependent files
[[tnw.nw]], [[inw.nw]], and [[mnw.nw]] to implement pretty-printing for the
languages Object-Oriented Turing, Icon, and {\sl Mathematica}.

<<*>>=
<>
<>
<>
<>
@

\section {A Typesetting Facility}
\subsection {The philosophy}

The addition to {\tt noweb} described here is based on the following two
premises:
\begin {itemize}
\item It should be as independent of the target language as possible, and
\item We don't want to write a full-blown scanner for the target language.
\end {itemize}

Strings of characters of the target language which we want to typeset specially
are called ``interesting tokens''.  Having had some experience with Web and
SpiderWeb, we define three categories of interesting tokens:
\begin {enumerate}
\item Reserved words of the target language: we want to typeset them in bold,
  say.
\item Other strings that we want to typeset specially: e.g.\ $\le$ for [[<=]].
\item Comment and quoting characters: we want what follows them or what is
  enclosed by them to be typeset literally.
\end {enumerate}

There is a table [[translation]] which defines a translation into \TeX\ code
for every interesting token in the target language.  Here is an excerpt from
the translation table for Icon:
\begin {center}
\begin {tabular}{l}
[[translation["by"] := "{\\Cb{}by}"]] \\
[[translation["break"] := "{\\Cb{}break}"]] \\
[[translation["&ascii"] := "{\\Cb{}\\&ascii}"]] \\
[[translation["&clock"] := "{\\Cb{}\\&clock}"]] \\
[[translation[">="] := "$\\ge$"]] \\
[[translation["~="] := "$\\neq$"]]
\end {tabular}
\end {center}
(Here the control sequence \verb+\Cb+ selects the Courier bold
font\footnote{The empty group \{\} serves to separate the control sequence
from its argument without introducing an extra space.}.)

We use four sets of strings to define the tokens in categories 2 and 3:
\begin {center}
[[special]], [[comment1]], [[comment2]], [[quote]].
\end {center}
[[comment1]] is for unbalanced comment strings (e.g.\ the character [[#]] in
Icon), [[comment2]] is for balanced comment strings (none in Icon), and
[[quote]] is for literal quotes ([["]] and [[']] in Icon), which we assume to
be balanced.
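To illustrate how the [[translation]] table is meant to be used, here is a
small, hypothetical Icon fragment (it is {\it not} a chunk of this filter): a
token that has an entry in the table is replaced by its \TeX\ translation, and
any other token is passed through unchanged.
\begin{verbatim}
# Hypothetical illustration only, not part of the filter.
procedure demo_translation()
   local translation
   translation := table()              # absent keys default to &null
   translation["by"] := "{\\Cb{}by}"
   translation[">="] := "$\\ge$"
   # \translation[tok] fails when there is no entry, so the
   # alternation falls back to the token itself.
   write(\translation[">="] | ">=")        # writes $\ge$
   write(\translation["count"] | "count")  # no entry: writes count
end
\end{verbatim}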
Our approach to recognizing the interesting tokens while scanning a line is to
have a set of characters [[begin_token]] (an Icon cset), containing all the
characters by which an interesting token may begin.  [[begin_token]] is the
union of
\begin {itemize}
\item the cset defining the characters which may begin a reserved word, and
\item the cset containing the initial characters of all strings in the
  special, comment, and quote sets.
\end {itemize}
Given a line of text, we scan up to a character in [[begin_token]], and,
depending on what this character is, we may try to complete the token by
further scanning.  If we succeed, we look up the token in the [[translation]]
table, and if the token is found we output its translation; otherwise we
output the token itself unchanged.  When comment or quote tokens are
recognized, further processing of the line may stop altogether, or
temporarily, until a matching token is found.

<>=
procedure main (args)
  <>
  <>
  <>
  <>
@

\subsection {Definitions of the interesting tokens} \label {int}

The set of characters allowed in an Icon identifier, of which a reserved word
is a special case:
<>=
res_word_chars := &letters ++ '&$'
id_chars := res_word_chars ++ &digits
@ Unbalanced and balanced comment tokens, and quoting tokens.  Note that
[[comment2]] is a set of {\it pairs} (``open comment'', ``close comment''),
while we assume that quoting tokens are such that the ``open quote'' and
``close quote'' tokens are identical.
<>=
comment1 := set(["#"])
comment2 := set([])
quote := set(["\"", "\'"])
@ The ``special'' tokens.  [[S]] is the set of all characters appearing in
strings in [[special]].
<>=
special := set(["{", "}", "\\", "||", "<", ">", ">=", "<=", "=>", "~=", "++", "**", "--"])
S := ''
every S := S ++ !special   # Nice!
@

The rest of the code is language-independent, and is in the file {\tt lipp.nw}
(Language-Independent Pretty-Printing).
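Finally, here is a hypothetical sketch, in Icon, of the kind of scanning loop
that the language-independent part might perform with the definitions above.
It is {\it not} the code in [[lipp.nw]]: it ignores comment and quote handling,
and it does not insist on the longest possible special token; it only shows how
[[begin_token]], [[res_word_chars]], [[id_chars]], [[special]] and
[[translation]] are meant to fit together.
\begin{verbatim}
# Hypothetical sketch only; the real scanner is in lipp.nw.
# Assumes begin_token, res_word_chars, id_chars, special and
# translation are the globals described above.
procedure pp_line(line)
   local tok
   line ? {
      # Copy uninteresting text verbatim, stopping where a token may begin.
      while writes(tab(upto(begin_token))) do {
         if any(res_word_chars) then           # may be a reserved word
            tok := tab(many(id_chars))
         else                                  # may be a special token
            tok := tab(match(!special)) | move(1)
         # Output the translation if there is one, else the token unchanged.
         writes(\translation[tok] | tok)
      }
      write(tab(0))                            # the rest of the line
   }
end
\end{verbatim}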